
Exam 3 Review

Suppose that X_i = x = (x_1, ..., x_k)ᵀ is observed and that Y_i | X_i = x_i are independent Binomial(n_i, π(x_i)) for i = 1, ..., N, where

ˆπ(x) = exp(ˆα + ˆβᵀx) / [1 + exp(ˆα + ˆβᵀx)].

This is called the full model for logistic regression, and the (k + 1) parameters α, β_1, ..., β_k are estimated.

For the saturated model, the Y_i | X_i = x_i are independent Binomial(n_i, π_i) for i = 1, ..., N, where ˆπ_i = Y_i/n_i. This model estimates the N parameters π_i.

Let l_SAT(π_1, ..., π_N) be the likelihood function for the saturated model and let l_FULL(α, β) be the likelihood function for the full model. Let L_SAT = log l_SAT(ˆπ_1, ..., ˆπ_N) be the log likelihood function for the saturated model evaluated at the MLE (ˆπ_1, ..., ˆπ_N), and let L_FULL = log l_FULL(ˆα, ˆβ) be the log likelihood function for the full model evaluated at the MLE (ˆα, ˆβ). Then the deviance is

D = G² = -2(L_FULL - L_SAT).

The degrees of freedom for the deviance = df_FULL = N - k - 1, where N is the number of parameters for the saturated model and k + 1 is the number of parameters for the full model.

The saturated model is usually not very good for binary data (all n_i = 1) or if the n_i are small. The saturated model can be good if all of the n_i are large, or if π_i is very close to 0 or 1 whenever n_i is small.

If X ~ χ²_d, then E(X) = d and V(X) = 2d. An observed value x > d + 3√d is unusually large, and an observed value x < d - 3√d is unusually small. When the saturated model is good, a rule of thumb is that the logistic regression model is OK if G² ≤ N - k - 1 (or if G² ≤ N - k - 1 + 3√(N - k - 1)).

An estimated sufficient summary or ESS plot is a plot of w_i = ˆα + ˆβᵀx_i versus Y_i, with the logistic curve of fitted proportions ˆπ(w_i) = e^{w_i}/(1 + e^{w_i}) added to the plot along with a step function of observed proportions.

29) Suppose that w_i takes many values (e.g. the LR model has a continuous predictor) and that k + 1 << N. Know that the LR model is good if the step function tracks the logistic curve of fitted proportions in the ESS plot. Also know that you should check that the LR model is good before doing inference with the LR model. See HW6 4.
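In closed form the deviance for grouped binomial data is G² = 2 Σ [Y_i log(Y_i/(n_i ˆπ_i)) + (n_i - Y_i) log((n_i - Y_i)/(n_i - n_i ˆπ_i))], with 0 log 0 = 0. A minimal numerical sketch of this formula and the rule of thumb above, using made-up data and hypothetical fitted proportions (not from the course notes):

```python
import numpy as np

def binomial_deviance(y, n, pi_hat):
    """G^2 = -2(L_FULL - L_SAT) for grouped binomial data, with 0*log(0) = 0."""
    y, n, pi_hat = (np.asarray(a, float) for a in (y, n, pi_hat))
    t1 = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / (n * pi_hat)), 0.0)
    t2 = np.where(n - y > 0,
                  (n - y) * np.log(np.where(n - y > 0, n - y, 1.0) / (n * (1.0 - pi_hat))),
                  0.0)
    return 2.0 * float(np.sum(t1 + t2))

# made-up data: N = 4 covariate patterns, k = 1 predictor
y      = np.array([2, 5, 8, 9])               # successes Y_i
n      = np.array([10, 10, 10, 10])           # trials n_i
pi_hat = np.array([0.22, 0.48, 0.76, 0.92])   # hypothetical fitted pi-hat(x_i)

G2 = binomial_deviance(y, n, pi_hat)
N, k = len(y), 1
df = N - k - 1
ok = G2 <= df + 3 * np.sqrt(df)               # rule-of-thumb check from above
```

In practice ˆπ(x_i) would come from the fitted logistic regression; the numbers here only illustrate the arithmetic.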

Response = Y    Terms = (X_1, ..., X_k)

Sequential Analysis of Deviance

Predictor   Total df            Total Deviance   Change df   Change Deviance
Ones        N-1 = df_o          G²_o
X_1         N-2                                  1
X_2         N-3                                  1
...
X_k         N-k-1 = df_FULL     G²_FULL          1

Data set = cbrain, Name of Fit = B1
Response = sex    Terms = (cephalic size log[size])

Sequential Analysis of Deviance

Predictor   Total df   Total Deviance   Change df   Change Deviance
Ones
cephalic
size
log[size]

Know how to use the above output for the following test. Assume that the ESS plot has been made and that the observed proportions track the logistic curve. If the logistic curve looks like a line with small positive slope, then the predictors may not be useful. The following test asks whether ˆπ(x_i) from the logistic regression should be used to estimate P(Y_i = 1 | x_i), or whether none of the predictors should be used, so that P(Y_i = 1) ≡ π for all i = 1, ..., N, estimated by ˆπ = Σ_{i=1}^N Y_i / Σ_{i=1}^N n_i.

30) The 4 step (log likelihood) deviance test:
i) H_o: β_1 = ... = β_k = 0 versus H_A: not H_o.
ii) Test statistic G²(o|F) = G²_o - G²_FULL.
iii) The p-value = P(W > G²(o|F)) where W ~ χ²_k has a chi-square distribution with k degrees of freedom. Note that k = (k + 1) - 1 = df_o - df_FULL = (N - 1) - (N - k - 1).
iv) Reject H_o if the p-value < δ and conclude that there is an LR relationship between Y and the predictors X_1, ..., X_k. If the p-value ≥ δ, fail to reject H_o and conclude that there is not an LR relationship between Y and the predictors X_1, ..., X_k. See HW6 6a.
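Once G²_o and G²_FULL are read off the Sequential Analysis of Deviance table, steps ii) through iv) are one line each. A sketch with hypothetical deviance values (assuming SciPy is available):

```python
from scipy.stats import chi2

def deviance_test(G2_o, G2_full, df_o, df_full):
    """4 step deviance test of Ho: beta_1 = ... = beta_k = 0."""
    stat = G2_o - G2_full            # ii) G^2(o|F)
    df = df_o - df_full              # = k
    pvalue = chi2.sf(stat, df)       # iii) P(W > G^2(o|F)), W ~ chi^2_k
    return stat, df, pvalue

# hypothetical deviances read off a Sequential Analysis of Deviance table:
# the Ones row gives G2_o with df_o = N - 1; the last row gives G2_full with df_full = N - k - 1
stat, df, p = deviance_test(G2_o=260.98, G2_full=160.0, df_o=199, df_full=196)
reject = p < 0.05                    # iv) reject Ho at delta = 0.05
```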

After obtaining an acceptable full model where

logit(π(x_i)) = α + β_1 x_i1 + ... + β_k x_ik = α + βᵀx_i,

try to obtain a reduced model Y_i | X_Ri = x_Ri independent Binomial(n_i, π(x_Ri)) where

logit(π(x_Ri)) = α_R + β_R1 x_Ri1 + ... + β_Rm x_Rim = α_R + β_Rᵀx_Ri

and {x_Ri1, ..., x_Rim} ⊂ {x_1, ..., x_k}. Let x_R,m+1, ..., x_Rk denote the k - m predictors that are in the full model but not in the reduced model. We want to test H_o: β_R,m+1 = ... = β_Rk = 0. For notational ease, we will often assume that the predictors have been sorted and partitioned so that x_i = x_Ri for i = 1, ..., k. Then the reduced model uses predictors x_1, ..., x_m, and we test H_o: β_m+1 = ... = β_k = 0. In practice, however, this sorting is usually not done.

Assume that the ESS plot looks good. Then we want to test H_o: the reduced model can be used instead of the full model, versus H_A: the full model is (significantly) better than the reduced model. Fit the full model and the reduced model to get the deviances G²_FULL and G²_RED.

31) The 4 step change in deviance test:
i) H_o: the reduced model is good. H_A: use the full model.
ii) Test statistic G²(R|F) = G²_RED - G²_FULL.
iii) The p-value = P(W > G²(R|F)) where W ~ χ²_{k-m} has a chi-square distribution with k - m degrees of freedom. Here k is the number of predictors in the full model while m is the number of predictors in the reduced model. Also notice that k - m = (k + 1) - (m + 1) = df_RED - df_FULL = (N - m - 1) - (N - k - 1).
iv) Reject H_o if the p-value < δ and conclude that the full model is (significantly) better than the reduced model. If the p-value ≥ δ, fail to reject H_o and conclude that the reduced model is good. See HW6 6b.

32) If the reduced model leaves out a single variable X_i, then the change in deviance test becomes H_o: β_i = 0 versus H_A: β_i ≠ 0. This likelihood ratio test is a competitor of the Wald test (see 28)). The likelihood ratio test is usually better than the Wald test if the sample size N is not large, but the Wald test is currently easier for software to produce. For large N the test statistics from the two tests tend to be very similar (asymptotically equivalent tests).

33) If the reduced model is good, then the plotted points in the EE plot of ˆα_R + ˆβ_Rᵀx_Ri versus ˆα + ˆβᵀx_i should cluster tightly about the identity line with unit slope and zero intercept. Know how to use the following output to test the reduced model versus the full model.
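The change in deviance test in 31) has the same shape as the deviance test in 30), only with df = k - m. A sketch with hypothetical deviances (assuming SciPy is available):

```python
from scipy.stats import chi2

def change_in_deviance_test(G2_red, G2_full, k, m, delta=0.05):
    """31) Ho: the reduced model is good, vs HA: use the full model."""
    stat = G2_red - G2_full          # ii) G^2(R|F)
    df = k - m                       # = df_RED - df_FULL
    pvalue = chi2.sf(stat, df)       # iii) P(W > G^2(R|F)), W ~ chi^2_{k-m}
    return stat, df, pvalue, pvalue < delta

# hypothetical: full model with k = 3 predictors, reduced model with m = 1
stat, df, p, use_full = change_in_deviance_test(G2_red=9.1, G2_full=5.8, k=3, m=1)
```

Here `use_full` is False, so this hypothetical reduced model would be judged good at δ = 0.05.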

Response = Y    Terms = (X_1, ..., X_k)   (Full Model)

Label      Estimate   Std Error   Est/SE                    p-value
Constant   ˆα         se(ˆα)      z_o,0                     for Ho: α = 0
x_1        ˆβ_1       se(ˆβ_1)    z_o,1 = ˆβ_1/se(ˆβ_1)     for Ho: β_1 = 0
...
x_k        ˆβ_k       se(ˆβ_k)    z_o,k = ˆβ_k/se(ˆβ_k)     for Ho: β_k = 0

Degrees of freedom: N - k - 1 = df_FULL
Deviance: D = G²_FULL

Response = Y    Terms = (X_1, ..., X_m)   (Reduced Model)

Label      Estimate   Std Error   Est/SE                    p-value
Constant   ˆα         se(ˆα)      z_o,0                     for Ho: α = 0
x_1        ˆβ_1       se(ˆβ_1)    z_o,1 = ˆβ_1/se(ˆβ_1)     for Ho: β_1 = 0
...
x_m        ˆβ_m       se(ˆβ_m)    z_o,m = ˆβ_m/se(ˆβ_m)     for Ho: β_m = 0

Degrees of freedom: N - m - 1 = df_RED
Deviance: D = G²_RED

Data set = Banknotes, Name of Fit = B1 (Full Model)
Response = Status    Terms = (Diagonal Bottom Top)
Coefficient Estimates
Label      Estimate   Std Error   Est/SE   p-value
Constant
Diagonal
Bottom
Top
Degrees of freedom: 196
Deviance: 0.009

Data set = Banknotes, Name of Fit = B2 (Reduced Model)
Response = Status    Terms = (Diagonal)
Coefficient Estimates
Label      Estimate   Std Error   Est/SE   p-value
Constant
Diagonal
Degrees of freedom: 198
Deviance:
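The Est/SE column in these tables is the Wald z statistic, and the p-value column is the two-sided normal tail probability. A sketch with a hypothetical estimate and standard error (assuming SciPy is available):

```python
from scipy.stats import norm

def wald_z(beta_hat, se):
    """Est/SE column: z = beta-hat/se, with two-sided p-value for Ho: beta_i = 0."""
    z = beta_hat / se
    return z, 2.0 * norm.sf(abs(z))

# hypothetical estimate and standard error
z, p = wald_z(beta_hat=1.96, se=1.0)   # p is about 0.05 at z = 1.96
```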

34) Let π(x) = P(success | x) = 1 - P(failure | x), where a success is what is counted and a failure is what is not counted (so if the Y_i are binary, π(x) = P(Y_i = 1 | x)). Then the estimated odds of success is

ˆΩ(x) = ˆπ(x) / (1 - ˆπ(x)).

35) In logistic regression, increasing a predictor x_i by 1 unit (while holding all other predictors fixed) multiplies the estimated odds of success by a factor of exp(ˆβ_i).

36) Suppose that the binary response variable Y is conditionally independent of x given a single linear combination βᵀx of the predictors, written Y ⊥ x | βᵀx. If the LR model holds and if the first SIR predictor ˆβ_SIR1ᵀx and ˆα + ˆβᵀx are highly correlated, then (to a good approximation) Y ⊥ x | ˆα + ˆβᵀx and Y ⊥ x | ˆβ_SIR1ᵀx. To make a binary response plot for logistic regression, fit SIR and the LR model and assume that the above conditions hold. Place the first SIR predictor on the horizontal axis and the 2nd SIR predictor ˆβ_SIR2ᵀx on the vertical axis. If Y = 0 use symbol 0 and if Y = 1 use symbol X. If the LR model is good, then consider the symbol density of X's and 0's in a narrow vertical slice. This symbol density should be approximately constant (up to binomial variation) from the bottom to the top of the slice (hence the X's and 0's should be mixed in the slice). The symbol density may change greatly as the slice is moved from the left to the right of the plot, e.g. from 0% to 100%. If there are slices where the symbol density is not constant from top to bottom, then the LR model may not be good (e.g. a more complicated model may be needed).

37) Given a predictor x, sometimes x is not used by itself in the full LR model. Suppose that Y is binary. Then to decide what functions of x should be in the model, look at the conditional distribution of x | Y = i for i = 0, 1. These rules are used if x is an indicator variable or if x is a continuous variable.

distribution of x | Y = i                  functions of x to include in the full LR model
x | Y = i is an indicator                  x
x | Y = i ~ N(µ_i, σ²)                     x
x | Y = i ~ N(µ_i, σ_i²)                   x and x²
x | Y = i has a skewed distribution        x and log(x)
x | Y = i has support on (0,1)             log(x) and log(1 - x)

38) If w is a nominal variable with J levels, use J - 1 (indicator or) dummy variables x_1,w, ..., x_J-1,w in the full model.

39) An interaction is a product of two or more predictor variables. Interactions are difficult to interpret. Often interactions are included in the full model, and the reduced model without any interactions is tested. The investigator is hoping that the interactions are not needed.
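Points 34) and 35) are short enough to verify by hand; a sketch with a hypothetical fitted proportion and coefficient:

```python
import math

def estimated_odds(pi_hat):
    """34) Omega-hat(x) = pi-hat(x) / (1 - pi-hat(x))."""
    return pi_hat / (1.0 - pi_hat)

# 35) increasing x_i by 1 unit multiplies the odds by exp(beta_i-hat)
beta_i_hat = 0.4                       # hypothetical coefficient
odds_multiplier = math.exp(beta_i_hat)

odds = estimated_odds(0.8)             # pi-hat = 0.8 gives odds of 4 to 1
```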

40) A scatterplot of x vs Y is used to visualize the conditional distribution of Y | x. A scatterplot matrix is an array of scatterplots; it is used to examine the marginal relationships of the predictors and response. Place Y on the top or bottom of the scatterplot matrix, and also mark the plotted points by a 0 if Y = 0 and by an X if Y = 1. Variables with outliers, missing values or strong nonlinearities may be so bad that they should not be included in the full model.

41) Suppose that all values of the variable x are positive. The log rule says add log(x) to the full model if max(x_i)/min(x_i) > 10.

42) To make a full model, use points 37), 38), 40) and 41), and sometimes 39). The number of predictors in the full model should be much smaller than the number of data cases N. Make an ESS plot to check that the full model is good.

43) Variable selection is closely related to the change in deviance test for a reduced model. You are seeking a subset I of the variables to keep in the model. The AIC(I) statistic is used as an aid in backward elimination and forward selection. The full model and the model with the smallest AIC are always of interest. Create a full model; the full model has a deviance at least as small as that of any submodel.

44) Backward elimination starts with the full model with k variables, and the predictor that optimizes some criterion is deleted. Then there are k - 1 variables left, and the predictor that optimizes some criterion is deleted. This process continues for models with k - 2, k - 3, ..., 3 and 2 predictors. Forward selection starts with the model with 0 variables, and the predictor that optimizes some criterion is added. Then there is 1 variable in the model, and the predictor that optimizes some criterion is added. This process continues for models with 2, 3, ..., k - 2 and k - 1 predictors. Both forward selection and backward elimination result in a sequence of k models {x_1}, {x_1, x_2}, ..., {x_1, x_2, ..., x_k-1}, {x_1, x_2, ..., x_k} = full model.

45) For logistic regression, suppose that the Y_i are binary for i = 1, ..., N. Let N_1 = Σ Y_i = the number of 1's and N_0 = N - N_1 = the number of 0's. Rule of thumb: the final submodel should have m predictor variables where m is small, with m ≤ min(N_1, N_0)/10.

46) Know how to find good models from output. A good submodel I will use a small number of predictors, have a good ESS plot, and have a good EE plot. A good LR submodel I should have a deviance G²(I) close to that of the full model, in that the change in deviance test 31) would not be rejected. Also the submodel should have a value of AIC(I) close to that of the examined model that has the minimum AIC value. The LR output for model I should not have many variables with large Wald test p-values.

47) Heuristically, backward elimination tries to delete the variable that will increase the deviance the least. An increase in deviance greater than 4 (if the predictor has 1 degree of freedom) may be troubling, in that a good predictor may have been deleted. In practice, the backward elimination program may delete the variable such that the submodel I with j predictors has 1) the smallest AIC(I), 2) the smallest deviance G²(I), or 3) the biggest p-value (preferably from a change in deviance test but possibly from a Wald test) in the test H_o: β_i = 0 versus H_A: β_i ≠ 0, where the current model with j + 1

variables is treated as the full model.

48) Heuristically, forward selection tries to add the variable that will decrease the deviance the most. A decrease in deviance less than 4 (if the predictor has 1 degree of freedom) may be troubling, in that a bad predictor may have been added. In practice, the forward selection program may add the variable such that the submodel I with j predictors has 1) the smallest AIC(I), 2) the smallest deviance G²(I), or 3) the smallest p-value (preferably from a change in deviance test but possibly from a Wald test) in the test H_o: β_i = 0 versus H_A: β_i ≠ 0, where the current model with j terms plus the predictor x_i is treated as the full model (for all variables x_i not yet in the model).

49) For logistic regression, let N_1 = number of ones and N_0 = N - N_1 = number of zeroes. A rough rule of thumb is that the full model should use no more than min(N_0, N_1)/5 predictors and the final submodel should use no more than min(N_0, N_1)/10 predictors.

50) For loglinear regression, a rough rule of thumb is that the full model should use no more than N/5 predictors and the final submodel should use no more than N/10 predictors.

51) Variable selection is pretty much the same for logistic regression and loglinear regression. Suppose that the full model is good and is stored in M1. Let M2, M3, M4, and M5 be candidate submodels found after forward selection, backward elimination, etc. Make a scatterplot matrix of M2, M3, M4, M5 and M1. Good candidates should have estimated linear predictors that are highly correlated with the full model estimated linear predictor (the correlation should be at least 0.9 and preferably greater than 0.95). For binary logistic regression, mark the symbols using the response variable Y. See HW7 1, HW8 1, HW9 1 and HW

52) The final submodel I should have few predictors, few variables with large Wald p-values (0.01 to 0.05 is borderline), a good ESS plot and an EE plot that clusters tightly about the identity line. Do not use more predictors than the min AIC model I_min, and want AIC(I) ≤ AIC(I_min) + 7. For the change in deviance test, want p-value ≥ 0.01 for variable selection (instead of δ = 0.05). If a factor has J - 1 dummy variables, either keep all J - 1 dummy variables or delete all J - 1 dummy variables; do not delete only some of the dummy variables.

53) Know that when there is perfect classification in the binary logistic regression model, the LR MLE does not exist and the output is suspect. However, often the full model deviance is close to 0 and the deviance test correctly rejects H_o.

54) Suppose that X_i = x = (x_1, ..., x_k)ᵀ is observed and that Y_i | X_i = x_i are independent Poisson(µ(x_i)) for i = 1, ..., N, where ˆµ(x) = exp(ˆα + ˆβᵀx). This is called the full model for loglinear regression, and the (k + 1) parameters α, β_1, ..., β_k are estimated. Know how to predict ˆµ(x). Also Ŷ = ˆµ(x). See HW9 2, Q8.
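The stepwise searches in 44), 47) and 48) can be sketched abstractly. A toy backward elimination under the assumption that AIC(I) is already known for each candidate subset (the subset names and AIC numbers below are made up):

```python
# hypothetical AIC(I) values for the candidate subsets I (made-up numbers)
AIC = {
    frozenset({"x1", "x2", "x3"}): 210.0,
    frozenset({"x1", "x2"}): 208.5,
    frozenset({"x1", "x3"}): 215.0,
    frozenset({"x2", "x3"}): 220.0,
    frozenset({"x1"}): 209.0,
    frozenset({"x2"}): 230.0,
}

def backward_eliminate(full_model):
    """44)/47): repeatedly drop the predictor whose removal gives the smallest AIC."""
    current = frozenset(full_model)
    path = [current]
    while len(current) > 1:
        candidates = [current - {v} for v in current]
        current = min(candidates, key=lambda s: AIC[s])
        path.append(current)
    return path

path = backward_eliminate({"x1", "x2", "x3"})
best = min(path, key=lambda s: AIC[s])    # smallest AIC along the search path
```

In real software each AIC(I) comes from fitting submodel I; the search itself only ever compares the current model with its one-variable-smaller candidates, which is why it visits k models rather than all 2^k subsets.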

For the saturated model, the Y_i | X_i = x_i are independent Poisson(µ_i) for i = 1, ..., N, where ˆµ_i = Y_i. This model estimates the N parameters µ_i. The saturated model is usually bad; an exception is when all N of the Y_i are large. The comments on the deviance in the middle of p. 1 still hold.

55) An estimated sufficient summary or EY plot is a plot of w_i = ˆα + ˆβᵀx_i versus Y_i, with the exponential curve of estimated means ˆµ(w_i) = e^{w_i} added to the plot along with a lowess curve.

56) Suppose that w_i takes many values (e.g. the LLR model has a continuous predictor) and that k + 1 << N. Know that the LLR model is good if the lowess curve tracks the exponential curve of estimated means in the EY plot. Also know that you should check that the LLR model is good before doing inference with the LLR model. See HW9 2.

57) Know how to perform the 4 step deviance test. This test is almost exactly the same as that in 30), but replace LR by LLR in the conclusion. The output looks almost like that shown on p. 2. See HW9 2, Q8. The deviance test for LLR asks whether ˆµ(x_i) from LLR should be used to estimate µ(x_i), or whether none of the predictors should be used, so ˆµ = Ȳ = Σ_{i=1}^N Y_i / N.

58) Know how to perform the 4 step Wald test. This test is the same as 28) except replace LR by LLR.

59) Know that a (Wald) 95% CI for β_i is ˆβ_i ± 1.96 se(ˆβ_i).

60) Know how to perform the 4 step change in deviance test. The output is almost the same as that on p. 4 and the test is exactly the same as that given in 31). For H_o, the parameters set to 0 are those that are in the full model but not the reduced model.

61) Know what a lurking variable is.

62) Know the difference between an observational study and an experiment. A clinical trial is a randomized controlled experiment performed on humans.

Exam 3 is on Wednesday, April 19 and covers Agresti material including points 23) through 28) on the Exam 2 review. 7 pages of notes. You should know how to use a random number table to draw a simple random sample in order to divide units into 2 groups. In Agresti, we have covered ch. 1, 2.1, 2.2, 2.3, 2.4, 5.1, 5.2, 5.3, 5.4, and 5.5, but have skipped subsections 2.4.5, 2.4.6, 2.4.7, 5.3.3, 5.3.4, and 5.5.6.
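For loglinear regression, the deviance against the saturated model (points 54 and 57) has the closed form G² = 2 Σ [Y_i log(Y_i/ˆµ_i) - (Y_i - ˆµ_i)], with 0 log 0 = 0. A sketch with made-up counts and hypothetical fitted means (not from the course notes):

```python
import numpy as np

def poisson_deviance(y, mu_hat):
    """G^2 = 2 * sum[ y*log(y/mu-hat) - (y - mu-hat) ], with 0*log(0) = 0."""
    y, mu_hat = np.asarray(y, float), np.asarray(mu_hat, float)
    term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu_hat), 0.0)
    return 2.0 * float(np.sum(term - (y - mu_hat)))

# hypothetical counts and fitted means mu-hat(x_i) = exp(alpha-hat + beta-hat^T x_i)
y  = np.array([3, 7, 12, 20])
mu = np.array([3.5, 6.5, 13.0, 19.0])
G2 = poisson_deviance(y, mu)

# if none of the predictors is used, mu-hat = Ybar for every case (see 57))
G2_null = poisson_deviance(y, np.full_like(mu, y.mean()))
```

The gap G2_null - G2 is the statistic for the LLR deviance test of 57); here it is large, as expected when the predictors carry information about the counts.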



More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Solutions for Examination Categorical Data Analysis, March 21, 2013

Solutions for Examination Categorical Data Analysis, March 21, 2013 STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Analysing categorical data using logit models

Analysing categorical data using logit models Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

More information

1/15. Over or under dispersion Problem

1/15. Over or under dispersion Problem 1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 Work all problems. 60 points are needed to pass at the Masters Level and 75

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

Solutions to the Spring 2018 CAS Exam MAS-1

Solutions to the Spring 2018 CAS Exam MAS-1 !! Solutions to the Spring 2018 CAS Exam MAS-1 (Incorporating the Final CAS Answer Key) There were 45 questions in total, of equal value, on this 4 hour exam. There was a 15 minute reading period in addition

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Poisson regression 1/15

Poisson regression 1/15 Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos

More information

THE PEARSON CORRELATION COEFFICIENT

THE PEARSON CORRELATION COEFFICIENT CORRELATION Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable this is known as a bivariate relation There

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350 bsa347 Logistic Regression Logistic regression is a method for predicting the outcomes of either-or trials. Either-or trials occur frequently in research. A person responds appropriately to a drug or does

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Three-Way Tables (continued):

Three-Way Tables (continued): STAT5602 Categorical Data Analysis Mills 2015 page 110 Three-Way Tables (continued) Now let us look back over the br preference example. We have fitted the following loglinear models 1.MODELX,Y,Z logm

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Regression modeling for categorical data. Part II : Model selection and prediction

Regression modeling for categorical data. Part II : Model selection and prediction Regression modeling for categorical data Part II : Model selection and prediction David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625 http://math.agrocampus-ouest.fr/infogluedeliverlive/membres/david.causeur

More information

Section Poisson Regression

Section Poisson Regression Section 14.13 Poisson Regression Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 26 Poisson regression Regular regression data {(x i, Y i )} n i=1,

More information

Answer Key for STAT 200B HW No. 7

Answer Key for STAT 200B HW No. 7 Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Multiple Logistic Regression for Dichotomous Response Variables

Multiple Logistic Regression for Dichotomous Response Variables Multiple Logistic Regression for Dichotomous Response Variables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Ch 6: Multicategory Logit Models

Ch 6: Multicategory Logit Models 293 Ch 6: Multicategory Logit Models Y has J categories, J>2. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. In R, we will fit these models using the

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information